Run Date: 19 August 2025
This notebook reads the CSV files in several directories, one per stage of review, checks the files, and combines them. Issues with files or data are reported, and the total counts of files and rows added are listed. There are different folders checked here in different phases of
Before stitching files together, review and correct the raw data files (e.g., the species CSV files) using the notebook L0_file_review.rmd.
The code here saves the combined file to the L0 data folder defined in the filepaths.R configuration (see below).
First, set up the folders and files from the configuration and check them:
File paths used for this run:
file_paths
$DATA_FOLDER
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working"
$SYNC_DATA_FOLDER
[1] "/Users/billspat/Library/CloudStorage/GoogleDrive-billspat@msu.edu/Shared drives/Avian_MetaNetwork/data/L0/avian_intxn_data"
$L0
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0"
$L1
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L1"
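The configuration itself lives in filepaths.R, which is not shown here. As a rough sketch only, a list with the same element names as the output above could be built and checked like this. The helper names make_file_paths() and check_file_paths() are hypothetical, and the real configuration may set the folders differently.

```r
# Hypothetical sketch of a filepaths.R-style configuration; the function
# names and folder layout are assumptions, not the project's actual file.
make_file_paths <- function(data_folder, sync_data_folder = NA_character_) {
  list(
    DATA_FOLDER      = data_folder,
    SYNC_DATA_FOLDER = sync_data_folder,
    L0               = file.path(data_folder, "L0"),
    L1               = file.path(data_folder, "L1")
  )
}

# Return a named logical vector saying whether each configured folder exists,
# and warn about any that are missing. Unset (NA) entries are skipped.
check_file_paths <- function(file_paths) {
  paths <- unlist(file_paths)
  paths <- paths[!is.na(paths)]
  found <- setNames(dir.exists(paths), names(paths))
  if (!all(found)) {
    warning("Missing folders: ", paste(names(found)[!found], collapse = ", "))
  }
  found
}

# Example: build paths under a temporary folder and check them
file_paths_example <- make_file_paths(tempdir())
dir.create(file_paths_example$L0, showWarnings = FALSE)
dir.create(file_paths_example$L1, showWarnings = FALSE)
check_file_paths(file_paths_example)
```

Returning a named vector instead of stopping lets the notebook continue when the optional synced Google Drive folder is not configured on a given machine.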
There are three main data folders, one for each state of the data files. For each state, collect the list of files and o
csv_file_group_name <- "checked"
csv_file_path <- file.path(file_paths$L0, "species")
# filter or add in Google Drive files here
# "\\.csv$" only matches names that end in .csv (not e.g. "x.csv.bak")
checked_file_list <- list.files(path = csv_file_path, pattern = "\\.csv$",
                                full.names = TRUE)
paste(csv_file_path, ":", csv_file_group_name, "files to process", length(checked_file_list))
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species : checked files to process 870"
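The same list-and-count pattern is repeated for each of the three folders. Below is a sketch of one way it could be folded into a single helper. list_stage_csvs() is a hypothetical name, and the mapping of group names to folders is taken from the chunks in this notebook.

```r
# Hypothetical helper: list the CSV files in one stage folder under L0 and
# report how many there are, like the paste() lines in this notebook.
list_stage_csvs <- function(l0_dir, stage, group_name = stage) {
  stage_dir <- file.path(l0_dir, stage)
  if (!dir.exists(stage_dir)) {
    stop("Stage folder not found: ", stage_dir)
  }
  # Anchored pattern: only names ending in .csv (any case) are returned
  files <- list.files(path = stage_dir, pattern = "\\.csv$",
                      full.names = TRUE, ignore.case = TRUE)
  message(paste(stage_dir, ":", group_name, "files to process", length(files)))
  files
}

# Group name -> folder under L0, as used in the chunks of this notebook
stage_folders <- c(checked = "species", species_temp = "species_temp",
                   species_in_review = "species_in_review")
# file_lists <- Map(function(group, folder) list_stage_csvs(file_paths$L0, folder, group),
#                   names(stage_folders), stage_folders)
```

Stopping on a missing folder is deliberate. An empty file list from a mistyped path would otherwise look like a stage with nothing to process.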
Stitching Files together:
intxnsL0sp <- L0_stitch(
csv_file_list = checked_file_list,
csv_file_group_name = csv_file_group_name
)
# intxnsL0sp is a list with several items to track counts of things and files
# see L0_functions.R for details
[1] "Rows in stitched file: 26372"
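L0_stitch() is defined in L0_functions.R and is not reproduced here. To show the general idea of stitching, here is a base-R sketch under stated assumptions. It is not the project's function. stitch_csvs() is a hypothetical name. It reads every file as text, records files that cannot be read, and row-binds the rest after giving every file the same columns.

```r
# Hypothetical sketch of a stitching step; this is NOT the project's
# L0_stitch() (see L0_functions.R for that). It uses base R only.
stitch_csvs <- function(csv_file_list) {
  frames <- list()
  failed <- character(0)
  for (f in csv_file_list) {
    # Read everything as text so type differences between files cannot clash
    df <- tryCatch(
      read.csv(f, colClasses = "character", check.names = FALSE),
      error = function(e) NULL
    )
    if (is.null(df)) {
      failed <- c(failed, f)      # file could not be read at all
      next
    }
    df$source_file <- rep(basename(f), nrow(df))  # remember where rows came from
    frames[[length(frames) + 1]] <- df
  }
  if (length(frames) == 0) {
    return(list(intxns = data.frame(), n_files = 0L, failed = failed, n_rows = 0L))
  }
  # Give every frame the union of all column names, filling gaps with NA
  all_cols <- unique(unlist(lapply(frames, names)))
  frames <- lapply(frames, function(df) {
    for (m in setdiff(all_cols, names(df))) df[[m]] <- rep(NA_character_, nrow(df))
    df[all_cols]
  })
  intxns <- do.call(rbind, frames)
  rownames(intxns) <- NULL
  list(intxns = intxns, n_files = length(frames), failed = failed,
       n_rows = nrow(intxns))
}
```

Reading all columns as character avoids type conflicts between files. The trade-off is that numeric columns such as n_studies must be converted after binding.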
Optionally examine the full stitched data frame:
[1] "Count of species after binding: 1228"
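This notebook does not show how the species count is computed. A plausible version counts the distinct, non-blank species names in one column. The column name species1_scientific below is an assumption, so substitute whichever column L0_stitch() actually uses.

```r
# Hypothetical: count distinct species in the stitched data. The default
# column name "species1_scientific" is an assumption, not confirmed here.
count_species <- function(intxns, species_col = "species1_scientific") {
  if (!species_col %in% names(intxns)) {
    stop("Column not found: ", species_col)
  }
  sp <- trimws(intxns[[species_col]])   # " Aa bb" and "Aa bb" count once
  length(unique(sp[!is.na(sp) & sp != ""]))
}
```

Trimming whitespace before counting matters here. Several in-review file names contain a stray space, which suggests the same typo could appear inside the data.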
Comparison of Pre- and Post-Binding Values
Column: n_studies
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  0.000   1.000   1.000   1.572   2.000  36.000    6673
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
   0     1    2    3   4   5   6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 22 24 25 36 <NA>
1244 12507 3084 1525 580 348 163 93 49 73 11 10 19  9  5  5  5  1  2  1  4  1  1  1  1 6673
Post-Binding Unique Counts:
< table of extent 0 >
Column: effect_sp1_on_sp2
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
-1.0000 -1.0000  0.0000  0.0414  1.0000  1.0000      37
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
   -1    0     1 <NA>
10341 4604 11433   37
Post-Binding Unique Counts:
< table of extent 0 >
Column: effect_sp2_on_sp1
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
-1.0000 -1.0000  0.0000  0.1418  1.0000  1.0000      38
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
  -1    0     1 <NA>
9049 4539 12789   38
Post-Binding Unique Counts:
< table of extent 0 >
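In R, summary(NULL) prints "Length Class Mode / 0 NULL NULL" and table(NULL) prints "< table of extent 0 >". The empty post-binding blocks above therefore appear to mean that the post-binding values passed to the comparison were NULL, for example because the column was not found in the object being compared. Below is a sketch of a column comparison that yields the "all_values" tables shown above and also reports whether the counts match. compare_column() and its arguments are hypothetical, not the code in L0_functions.R.

```r
# Hypothetical sketch of a pre-/post-binding check for one numeric column.
# `file_frames` is a list of per-file data frames; `stitched` is the bound result.
compare_column <- function(file_frames, stitched, col) {
  # Pool the column from every file before binding (missing columns add nothing)
  all_values <- unlist(lapply(file_frames, function(df) {
    suppressWarnings(as.numeric(df[[col]]))
  }))
  post_values <- suppressWarnings(as.numeric(stitched[[col]]))
  pre_counts <- table(all_values, useNA = "ifany")
  post_counts <- table(post_values, useNA = "ifany")
  list(
    pre_summary = summary(all_values),
    post_summary = summary(post_values),
    pre_counts = pre_counts,
    post_counts = post_counts,
    # TRUE only if the same values occur the same number of times
    same_counts = identical(names(pre_counts), names(post_counts)) &&
      identical(as.integer(pre_counts), as.integer(post_counts))
  )
}
```

A single same_counts flag per column is easier to scan across hundreds of files than the printed summaries. The full tables stay available for inspection when the flag is FALSE.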
csv_file_group_name <- "species_temp"
csv_file_path <- file.path(file_paths$L0, "species_temp")
# filter or add in Google Drive files here
temp_file_list <- list.files(path = csv_file_path, pattern = "\\.csv$",
                             full.names = TRUE)
paste(csv_file_path, ":", csv_file_group_name, "files to process", length(temp_file_list))
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_temp : species_temp files to process 39"
intxnsL0sptemp <- L0_stitch(
csv_file_list = temp_file_list,
csv_file_group_name = csv_file_group_name
)
paste("Count of species after binding", intxnsL0sptemp$count)
[1] "Count of species after binding 170"
Comparison of Pre- and Post-Binding Values
Column: n_studies
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.000   1.000   1.000   1.553   2.000  24.000    1168
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
   1   2   3  4  5  6 7 8 9 10 11 14 24 <NA>
1662 421 172 67 31 24 3 2 3  1  1  1  1 1168
Post-Binding Unique Counts:
< table of extent 0 >
Column: effect_sp1_on_sp2
Pre-Binding Summary:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's
-1.00000 -1.00000  0.00000 -0.06363  1.00000  1.00000        5
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
  -1   0    1 <NA>
1621 536 1395    5
Post-Binding Unique Counts:
< table of extent 0 >
Column: effect_sp2_on_sp1
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
-1.0000  0.0000  1.0000  0.4237  1.0000  1.0000       5
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
 -1   0    1 <NA>
722 603 2227    5
Post-Binding Unique Counts:
< table of extent 0 >
csv_file_group_name <- "species_in_review"
csv_file_path <- file.path(file_paths$L0, "species_in_review")
# filter or add in Google Drive files here
in_review_file_list <- list.files(path = csv_file_path, pattern = "\\.csv$",
                                  full.names = TRUE)
paste(csv_file_path, ":", csv_file_group_name, "files to process", length(in_review_file_list))
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review : species_in_review files to process 292"
intxnsL0spir <- L0_stitch(
csv_file_list = in_review_file_list,
csv_file_group_name = csv_file_group_name
)
Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review/Cynanthus_ doubledayi_AJ.csv"
Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review/Micrastur_ruficollis_CR.csv"
Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review/Pampa_ curvipennis_AJ.csv"
Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review/Rhynchocyclus_brevirostris_AJ.csv"
Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review/Saucerottia_ beryllina_AJ.csv"
Warning: The following named parsers don't match the column names: OLDsourceA, OLDsourceB, sourceAupdatedURL, sourceBupdatedURL, sourceCupdatedURL, sourceDupdatedURL
Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review/Streptoprocne_zonaris_AJ.csv"
Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/species_in_review/Tachycineta_ albilinea_AJ.csv"
Warning: The following named parsers don't match the column names: OLDsourceA, OLDsourceB, sourceAupdatedURL, sourceBupdatedURL, sourceCupdatedURL, sourceDupdatedURL
Warning: The following named parsers don't match the column names: OLDsourceA, OLDsourceB, sourceAupdatedURL, sourceBupdatedURL, sourceCupdatedURL, sourceDupdatedURL
[1] "Count of species after binding: 853"
Comparison of Pre- and Post-Binding Values
Column: n_studies
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  0.000   1.000   1.000   1.102   1.000  10.000    5817
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
  0    1   2  3  4   5 6 7 9 10 <NA>
266 6818 227 65 46 114 9 4 1  1 5817
Post-Binding Unique Counts:
< table of extent 0 >
Column: effect_sp1_on_sp2
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
 -1.000   0.000   1.000   0.654   1.000   1.000      34
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
 -1    0    1 <NA>
898 2817 9619   34
Post-Binding Unique Counts:
< table of extent 0 >
Column: effect_sp2_on_sp1
Pre-Binding Summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
-1.0000  0.0000  1.0000  0.6659  1.0000  1.0000      34
Post-Binding Summary:
Length  Class   Mode
     0   NULL   NULL
Pre-Binding Unique Counts:
all_values
 -1    0    1 <NA>
775 2905 9654   34
Post-Binding Unique Counts:
< table of extent 0 >
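The in-review warnings point to two kinds of file problems. Several file names contain a space after an underscore (e.g. "Pampa_ curvipennis_AJ.csv"), and some files lack columns the parser expects (OLDsourceA, sourceAupdatedURL, ...). For the parsing issues themselves, the warning's own advice applies: call problems() on the data frame returned by readr/vroom. Below is a base-R sketch that flags the other two problems before stitching. audit_csv_files() is hypothetical, and expected_cols must come from the project's own column definitions.

```r
# Hypothetical audit: flag file names containing whitespace and report header
# columns that differ from an expected set. `expected_cols` is an assumption.
audit_csv_files <- function(csv_files, expected_cols) {
  rows <- lapply(csv_files, function(f) {
    # Read only the header row; unreadable files report an empty header
    header <- tryCatch(
      names(read.csv(f, nrows = 1, check.names = FALSE)),
      error = function(e) character(0)
    )
    data.frame(
      file = basename(f),
      space_in_name = grepl("[[:space:]]", basename(f)),
      missing_cols = paste(setdiff(expected_cols, header), collapse = ";"),
      extra_cols = paste(setdiff(header, expected_cols), collapse = ";"),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

The audit returns one row per file. It could be written out as a to-do list for the review step in L0_file_review.rmd instead of relying on warnings scrolling past.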
Before saving the species file, you can uncomment and run the rbind() line below to merge the species_in_review interaction data with the checked species data. Both intxnsL0sp and intxnsL0spir are lists whose data frame is stored in the element intxns.
# include both in-review and checked interactions
# intxnsL0 <- rbind(intxnsL0sp$intxns, intxnsL0spir$intxns)
# or include only the checked interactions (no in-review)
intxnsL0 <- intxnsL0sp$intxns
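Base rbind() on data frames stops with an error when the column names do not match. If the in-review data carries columns the checked data lacks, the commented-out merge would fail. dplyr::bind_rows() fills missing columns with NA. Below is a base-R sketch of the same idea that also tags each row with its source. bind_intxns() and the review_state column are hypothetical.

```r
# Hypothetical helper: bind two interaction tables that may not share exactly
# the same columns, filling gaps with NA and tagging each row's review state.
bind_intxns <- function(checked, in_review) {
  checked$review_state <- rep("checked", nrow(checked))
  in_review$review_state <- rep("in_review", nrow(in_review))
  all_cols <- union(names(checked), names(in_review))
  fill <- function(df) {
    for (m in setdiff(all_cols, names(df))) df[[m]] <- rep(NA, nrow(df))
    df[all_cols]   # same column order in both frames
  }
  out <- rbind(fill(checked), fill(in_review))
  rownames(out) <- NULL
  out
}

# intxnsL0 <- bind_intxns(intxnsL0sp$intxns, intxnsL0spir$intxns)
```

Keeping a review_state column would let in-review rows be filtered out again later without re-running the stitch.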
Count of rows (interaction records) in saved L0:
[1] 26372
Note that the species1 column in a given species CSV can contain species other than the file's own species, because many pairwise interactions may be entered at once, for example for a mixed flock. Any duplicates are removed in a later step.
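The later de-duplication step is not shown in this notebook. A minimal sketch with base R's duplicated() is given below. drop_duplicate_intxns() and its default key columns are assumptions, so use whichever columns define a unique record in the project.

```r
# Hypothetical: drop duplicated pairwise interactions. The default key
# columns are assumptions; choose the columns that define a unique record.
drop_duplicate_intxns <- function(intxns,
                                  key_cols = c("species1_scientific",
                                               "species2_scientific",
                                               "interaction")) {
  missing <- setdiff(key_cols, names(intxns))
  if (length(missing) > 0) {
    stop("Missing key columns: ", paste(missing, collapse = ", "))
  }
  keep <- !duplicated(intxns[key_cols])   # keeps the first occurrence
  out <- intxns[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

This treats A-with-B and B-with-A as distinct records, since effect_sp1_on_sp2 and effect_sp2_on_sp1 are directional. Whether reversed pairs from a mixed-flock entry should also be merged is a project decision.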
Export the data to become the current L0 interaction data, overwriting any existing file.
# change the name here to save a test file or overwrite previous version
intxns_file_name <- "AvianInteractionData_L0_test.csv"
L0_file <- save_L0_intxns(intxnsL0,
intxns_file_name,
L0_dir = file_paths$L0)
[1] "/Users/billspat/tmp/Avian-Interaction-Database-Working/L0/AvianInteractionData_L0.csv"
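save_L0_intxns() comes from the project's functions and is not shown here. A rough sketch of such a save step follows. It writes the CSV into the L0 folder and returns the path it wrote, so the printed path can be compared with intxns_file_name. save_intxns_csv() and its overwrite argument are hypothetical.

```r
# Hypothetical sketch of a save step, NOT the project's save_L0_intxns():
# write the data frame into L0_dir and return the full path that was written.
save_intxns_csv <- function(intxns, file_name, L0_dir, overwrite = TRUE) {
  if (!dir.exists(L0_dir)) stop("L0 folder not found: ", L0_dir)
  out_path <- file.path(L0_dir, file_name)
  if (file.exists(out_path) && !overwrite) {
    stop("File exists and overwrite = FALSE: ", out_path)
  }
  write.csv(intxns, out_path, row.names = FALSE, na = "")  # NA written as empty
  out_path
}
```

An explicit overwrite switch makes writing a test file versus replacing the current L0 file a deliberate choice rather than a side effect of the file name.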